Smart Paper Notes

End-to-End Automatic Speech Recognition model for the Sudanese Dialect

Ayman Mansour , Wafaa F. Mukhtar

Category: Natural Language Processing

2022-12-21

Designing a natural voice interface relies mostly on speech recognition for interaction between humans and their modern digital equipment. In addition, speech recognition narrows the gap between monolingual individuals, helping them communicate. However, the field lacks wide support for several universal languages and their dialects, even though most daily conversations are carried out in them. This paper inspects the viability of designing an Automatic Speech Recognition model for the Sudanese dialect, one of the dialects of Arabic, whose complexity is a product of historical and social conditions unique to its speakers. These conditions are reflected in both the form and content of the dialect, so this paper gives an overview of the Sudanese dialect and of the collection of representative resources and the pre-processing performed to construct a modest dataset that addresses the lack of annotated data. It also proposes an end-to-end speech recognition model built with Convolutional Neural Networks. The Sudanese dialect dataset would be a stepping stone to enable future Natural Language Processing research targeting the dialect. The designed model provided some insights into the current recognition task and reached an average Label Error Rate of 73.67%.
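To make the reported metric concrete, here is a minimal sketch of a Label Error Rate computation. It assumes the common definition (edit distance between predicted and reference label sequences, divided by the total number of reference labels); the abstract does not state which definition or averaging scheme the authors used, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(refs, hyps):
    """Corpus-level LER: total edits over total reference labels (assumed definition)."""
    total_labels = sum(len(r) for r in refs)
    if total_labels == 0:
        raise ValueError("references contain no labels")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_labels
```

Under this definition, a rate of 73.67% would mean roughly three edits for every four reference labels.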


Dialog2API: Task-Oriented Dialogue with API Description and Example Programs

Raphael Shu , Elman Mansimov , Tamer Alkhouli , Nikolaos Pappas , Salvatore Romeo , Arshit Gupta , Saab Mansour , Yi Zhang , Dan Roth

Category: Natural Language Processing

2022-12-20

Functionality and dialogue experience are two important factors of task-oriented dialogue systems. Conventional approaches with a closed schema (e.g., conversational semantic parsing) often fail because both the functionality and the dialogue experience are strongly constrained by the underlying schema. We introduce a new paradigm for task-oriented dialogue, Dialog2API, to greatly expand the functionality and provide a seamless dialogue experience. The conversational model interacts with the environment by generating and executing programs that trigger a set of pre-defined APIs. The model also manages the dialogue policy and interacts with the user by generating appropriate natural language responses. By allowing free-form program generation, Dialog2API supports composite goals that combine different APIs, whereas unrestricted program revision provides a natural and robust dialogue experience. To facilitate Dialog2API, the core model is provided with API documents, an execution environment and, optionally, some example dialogues annotated with programs. We propose an approach tailored to Dialog2API, where the dialogue states are represented by a stack of programs, with the most recently mentioned program on top of the stack. Dialog2API can work with many application scenarios such as software automation and customer service. In this paper, we construct a dataset for AWS S3 APIs and present evaluation results of in-context learning baselines.
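The stack-of-programs dialogue state can be pictured with a small sketch. This is an illustration under assumptions, not the authors' implementation: the class name, its methods, and the program strings are hypothetical, and programs are treated as opaque strings rather than real AWS S3 calls.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Dialogue state as a stack of programs; the last element is the top."""
    programs: list = field(default_factory=list)

    def mention(self, program: str) -> None:
        """Move (or add) a program to the top: it is now the most recently mentioned."""
        if program in self.programs:
            self.programs.remove(program)
        self.programs.append(program)

    def revise_top(self, new_program: str) -> None:
        """Replace the program on top of the stack with a revised version."""
        if not self.programs:
            raise IndexError("no program to revise")
        self.programs[-1] = new_program

    def top(self) -> str:
        """Return the most recently mentioned program."""
        return self.programs[-1]


# Hypothetical usage: the program strings below are placeholders.
state = DialogueState()
state.mention("create_bucket(name='logs')")
state.mention("list_objects(bucket='logs')")
state.mention("create_bucket(name='logs')")        # mentioned again, moves to top
state.revise_top("create_bucket(name='logs-2022')")  # user revises the request
```

Keeping the most recently mentioned program on top lets a follow-up turn such as a revision apply to the program the user most plausibly means.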


Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
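For reference, K-fold cross-validation on a training set partitions the samples into K disjoint validation folds, training on the remainder each time. The sketch below is a generic, unshuffled illustration in plain Python; the function name is an assumption and nothing here reflects how any surveyed participant implemented it.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, val_indices) pairs covering range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    splits = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, val))
        start += size
    return splits
```

Each sample appears in exactly one validation fold, so the K validation scores together cover the whole training set.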


Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems

Denis Emelin , Daniele Bonadiman , Sawsan Alqahtani , Yi Zhang , Saab Mansour

Category: Natural Language Processing | Artificial Intelligence

2022-12-15

Pre-trained language models (PLMs) have advanced the state of the art across NLP applications, but lack domain-specific knowledge that does not naturally occur in pre-training data. Previous studies augmented PLMs with symbolic knowledge for different downstream NLP tasks. However, the knowledge bases (KBs) utilized in these studies are usually large-scale and static, in contrast to the small, domain-specific, and modifiable knowledge bases that are prominent in real-world task-oriented dialogue (TOD) systems. In this paper, we showcase the advantages of injecting domain-specific knowledge prior to fine-tuning on TOD tasks. To this end, we utilize light-weight adapters that can be easily integrated with PLMs and serve as a repository for facts learned from different KBs. To measure the efficacy of the proposed knowledge injection methods, we introduce Knowledge Probing using Response Selection (KPRS), a probe designed specifically for TOD models. Experiments on KPRS and the response generation task show improvements of knowledge injection with adapters over strong baselines.


Differentially-Private Bayes Consistency

Olivier Bousquet , Haim Kaplan , Aryeh Kontorovich , Yishay Mansour , Shay Moran , Menachem Sadigurschi , Uri Stemmer

Category: Machine Learning | Machine Learning (Statistics)

2022-12-08

We construct a universally Bayes consistent learning rule that satisfies differential privacy (DP). We first handle the setting of binary classification and then extend our rule to the more general setting of density estimation (with respect to the total variation metric). The existence of a universally consistent DP learner reveals a stark difference with the distribution-free PAC model. Indeed, in the latter DP learning is extremely limited: even one-dimensional linear classifiers are not privately learnable in this stringent model. Our result thus demonstrates that by allowing the learning rate to depend on the target distribution, one can circumvent the above-mentioned impossibility result and in fact learn arbitrary distributions by a single DP algorithm. As an application, we prove that any VC class can be privately learned in a semi-supervised setting with a near-optimal labeled sample complexity of $\tilde{O}(d/\varepsilon)$ labeled examples (and with an unlabeled sample complexity that can depend on the target distribution).


Counterfactual Optimism: Rate Optimal Regret for Stochastic Contextual MDPs

Orin Levy , Asaf Cassel , Alon Cohen , Yishay Mansour

Category: Machine Learning

2022-11-27

We present the UC$^3$RL algorithm for regret minimization in Stochastic Contextual MDPs (CMDPs). The algorithm operates under the minimal assumptions of a realizable function class and access to offline least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys an $\widetilde{O}(H^3 \sqrt{T |S| |A|(\log (|\mathcal{F}|/\delta) + \log (|\mathcal{P}|/ \delta) )})$ regret guarantee, where $T$ is the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, and $\mathcal{P}$ and $\mathcal{F}$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting.


Parameter and Data Efficient Continual Pre-training for Robustness to Dialectal Variance in Arabic

Soumajyoti Sarkar , Kaixiang Lin , Sailik Sengupta , Leonard Lausen , Sheng Zha , Saab Mansour

Category: Natural Language Processing | Machine Learning

2022-11-08

The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. In recent times, Arabic has been receiving widespread attention on account of its dialectal variance. While prior research studies have tried to adapt these multilingual models for dialectal variants of Arabic, it still remains a challenging problem owing to the lack of sufficient monolingual dialectal data and parallel translation data for such dialectal variants. It remains an open problem whether the limited dialectal data can be used to improve models trained in Arabic on its dialectal variants. First, we show that multilingual-BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared to our custom monolingual Arabic model, and beats existing models (by an avg metric of +$6.41$). We then explore two continual pre-training methods: (1) using small amounts of dialectal data for continual finetuning and (2) using parallel Arabic to English data and a Translation Language Modeling loss function. We show that both approaches help improve performance on dialectal classification tasks ($+4.64$ avg. gain) when used on monolingual models.


BDSL 49: A Comprehensive Dataset of Bangla Sign Language

Ayman Hasib , Saqib Sizan Khan , Jannatul Ferdous Eva , Mst. Nipa Khatun , Ashraful Haque , Nishat Shahrin , Rashik Rahman , Hasan Murad , Md. Rajibul Islam , Molla Rashied Hussein

Category: Computer Vision

2022-08-14

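Language is a method by which individuals express their thoughts. Each language has its own set of alphabetic and numeric characters. People can communicate with one another through oral or written communication. However, each language also has a sign language counterpart. Individuals who are deaf and/or mute communicate through sign language. The Bangla language also has a sign language, called BDSL. The dataset is about Bangla hand sign images. The collection contains images of 49 individual Bangla alphabet signs. BDSL49 is a dataset that consists of 29,490 images with 49 labels. During data collection, images of 14 different adults, each with a distinct background and appearance, were recorded. During preparation, several strategies were used to eliminate noise from the dataset. The dataset is freely available to researchers, who can use it to develop automated systems with machine learning, computer vision, and deep learning techniques. In addition, two models were applied to this dataset: the first for detection and the second for recognition.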


Regret Minimization and Convergence to Equilibria in General-sum Markov Games

Liad Erez , Tal Lancewicki , Uri Sherman , Tomer Koren , Yishay Mansour

Category: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-07-28

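An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.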


Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

Orin Levy , Yishay Mansour

Category: Machine Learning

2022-07-22

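We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least squares regression oracle. We analyze three different settings: where the dynamics are known, where the dynamics are unknown but independent of the context, and the most challenging setting, where the dynamics are unknown and context-dependent. For the latter, our algorithm obtains a regret bound of $\tilde{O}\left(\max\{H, 1/p_{min}\} \ldots T \log(\max\{|\mathcal{F}|, |\mathcal{P}|\}/\delta) \ldots\right)$ with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{F}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, our approach is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional knowledge regarding the function class, such as it being linear, etc.). In addition, we present a lower bound of $\Omega(\ldots)$ on the expected regret, which holds even in the case of known dynamics.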
